Week 10.6 - Hands-On Activities and Assessment

🎯 What We'll Cover

This is where the week becomes concrete. You have a working definition of agents (10.1), a taxonomy of how they fail (10.2), a map of the tools (10.3), an understanding of agentic RAG (10.4), and an honest free-tier guide (10.5). Now you apply all of it to a research question you actually care about.

The headline activity is “Same Task, Three Ways”: run one question through plain chat, chat-with-tools, and a Deep Research mode, then judge the results with the Week 9 failure taxonomy and verify them with the Week 5 citation checks. Two shorter activities follow, and the week's assessment pulls them together. Everything here is designed to be done entirely on free tools — on a phone or a borrowed laptop if that is what you have — because, as 10.5 argued, a course for this continent cannot assume a subscription.

One last echo of Week 9: every output you produce this week is a snapshot dated May 2026. Part of the assessment is acknowledging that — recording what was true when you did the work, and how soon you would expect it to change.

🧪 Activity 1: Same Task, Three Ways

The core activity. Choose a research question in your own field — specific, genuinely answerable, and not trivially Googleable. Something a knowledgeable colleague would have to think about. You will be the expert judge of the answers, which is the point: you can tell when an AI is wrong in your own area in a way you cannot in someone else's.

Run that one question through three modes, keeping everything else the same:

Plain chat — a free assistant with no tools (Claude.ai, ChatGPT, or Gemini free; pick one and stay with it). The model answers from training alone.
Chat with tools — the same kind of assistant, but with web search / browsing switched on, so it can retrieve before answering.
Deep Research mode — a full agentic-RAG run. Perplexity's free Deep Research is the recommended default. If its quota is exhausted or sign-up is blocked, Kimi (kimi.com) is an acceptable substitute — but see the data-disclosure rule below, which applies whichever tool you choose.

Then write a one-to-two-page comparison. Do not just say which was “best”. For each mode, record: how deep and specific the answer was; how good the citations were (and whether they exist, which you will check in Activity 2); what it got wrong in your expert judgement; and where it failed. Then do the analytical core of the exercise:

📊 Apply the Week 9.2 taxonomy explicitly

For every failure you observed, classify it: was it patched (you were using a weak tool; a current one would not fail this way), reduced-but-persistent (a known weakness you can manage with better prompting or tool choice), or structural (something the next model release will not fix: long-tail gaps, compositional error, the reliability-not-accuracy problem from 10.2)? Then, for each structural failure, state what the Week 9.5 verification protocol would have you do about it. This is the muscle the whole activity exists to build.
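If you prefer to keep your failure log in a structured form, the classification step can be sketched in a few lines of Python. This is only one possible layout: the three category names come from the Week 9.2 taxonomy, but the `Failure` class, its field names, and the placeholder entries are assumptions made for illustration, not part of the course materials.

```python
from collections import Counter
from dataclasses import dataclass

# Category names follow the Week 9.2 taxonomy. Everything else here
# (class name, fields, sample entries) is a hypothetical illustration.
CATEGORIES = ("patched", "reduced-but-persistent", "structural")


@dataclass
class Failure:
    mode: str          # "plain chat", "chat with tools", or "deep research"
    description: str   # what went wrong, in your expert judgement
    category: str      # one of CATEGORIES
    action: str = ""   # your Week 9.5 verification step (needed for structural)

    def __post_init__(self):
        # Reject labels that are not in the taxonomy.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def summarise(failures):
    """Count failures per category and list structural ones with no action yet."""
    counts = Counter(f.category for f in failures)
    missing = [
        f for f in failures
        if f.category == "structural" and not f.action.strip()
    ]
    return counts, missing


# Placeholder entries only; replace them with what you actually observed.
log = [
    Failure("plain chat", "placeholder failure A", "patched"),
    Failure("deep research", "placeholder failure B", "structural"),
]
counts, missing = summarise(log)
print(dict(counts))
print([f.description for f in missing])
```

The `missing` list is the useful part: it flags every structural failure for which you have not yet written down a verification action.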

🔍 Activity 2: Verify a Deep Research Output

This follows directly from Activity 1. Take the Deep Research report you generated and put its citations on trial, using the tools you already have:

Run every citation through the Week 5 Five-Point Citation Check (does the paper exist; are the authors and year right; does it say what the report claims; is the venue real; does the link resolve to the right thing).
Apply the Week 9 dated-research check: which model produced this, when, and would the claim survive a retest on a current model?

Then report the numbers: of the citations the Deep Research tool gave you, how many checked out completely? How many pointed to real sources that said something different from the report's claim? How many cited “papers” that do not appear to exist at all? This is the Week 5 hallucinated-citation exercise carried into the agentic-RAG era, and the results are usually sobering, which is exactly the lesson. A fluent, well-formatted, confident research report is not a verified one.
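The counting itself is simple enough to do by hand, but a small script keeps the buckets honest if the report has many citations. The sketch below assumes you record the five Week 5 checks as true/false answers per citation; the `CitationCheck` class and the verdict labels are hypothetical names, and the fourth bucket (a real source that supports the claim but has wrong metadata or a broken link) is an addition not named in the activity text.

```python
from dataclasses import dataclass


@dataclass
class CitationCheck:
    """Your manual answers to the Five-Point Citation Check for one citation."""
    label: str
    exists: bool            # does the paper exist?
    authors_year_ok: bool   # are the authors and year right?
    supports_claim: bool    # does it say what the report claims?
    venue_real: bool        # is the venue real?
    link_resolves: bool     # does the link resolve to the right thing?


def verdict(c):
    """Map the five answers to one reporting bucket."""
    if not c.exists:
        return "does not appear to exist"
    if not c.supports_claim:
        return "real, but says something different"
    if c.authors_year_ok and c.venue_real and c.link_resolves:
        return "checked out completely"
    # Assumed extra bucket: the source is real and supports the claim,
    # but some metadata or the link is wrong.
    return "real, with metadata problems"


def tally(checks):
    """Count citations per bucket."""
    result = {}
    for c in checks:
        v = verdict(c)
        result[v] = result.get(v, 0) + 1
    return result


# Placeholder rows only; fill in one row per citation in your own report.
checks = [
    CitationCheck("citation 1", True, True, True, True, True),
    CitationCheck("citation 2", True, True, False, True, True),
    CitationCheck("citation 3", False, False, False, False, False),
]
for bucket, n in tally(checks).items():
    print(f"{bucket}: {n} of {len(checks)}")
```

Note that the script only counts; every true/false value still has to come from you opening the source and reading it.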

🔌 Activity 3 (Optional): A Small MCP Workflow

For students who want to go further. Using one of Claude.ai's free connectors — the creative connectors, or (since April 2026) the read-only Microsoft 365 connector — wire up one small step of your research workflow, involving nothing personal or confidential. Write 250 words on what worked, what failed, and — most importantly — what you would never let it do unsupervised, and why. The point is not the connector; it is articulating your own permissions dial (10.1) for a real task.

🔒 A note on the Microsoft 365 connector

It needs a business or education Microsoft account (not a personal @outlook.com one), and your institution's IT must allow the connection. There is a simple way to find out whether yours does: just try adding it. If Microsoft shows an “administrator approval required” screen, self-service connection is not enabled for your institution and you would need IT to approve it. Reading that consent screen — and deciding whether you would even want to grant the access it asks for — is itself a useful exercise in the permissions thinking this week is about.

🌐 The Data-Disclosure Rule (Required, Graded)

Every tool in this activity processes your data outside South Africa — the US-based assistants and the Chinese ones alike (10.5). So a disclosure statement is part of the deliverable, not an optional extra. Two firm rules and a template:

No personal or confidential data. Use a research question that involves no identifiable personal information, no participant data, no unpublished third-party material. POPIA (section 72) restricts sending personal information abroad regardless of the destination country, and a free consumer AI tool gives you no basis to assume the recipient meets the bar. Keep the activity to public, non-personal questions.
Disclose what you used and where it went. Naming your tools and their data destinations is a habit worth building now — it is exactly what a research-ethics committee or a journal will increasingly expect.

📄 Disclosure statement template (copy, complete, submit)

“For this exercise I used: [tool 1], [tool 2], [tool 3].
Each processes data outside South Africa, in: [country/region per tool, e.g. United States / China / unknown].
The research question involved no identifiable personal information or third-party confidential material.
I verified the outputs as follows: [Five-Point Citation Check / dated-research check / other].
Outputs are accurate as of [date]; I would expect the tool capabilities and free-tier limits described to change within [estimate].”
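If you expect to write this statement more than once, the template can be filled in programmatically. The following is a minimal sketch, not a required format: the function name, its parameters, and the sample values are all assumptions, and the tool names and regions shown are placeholders you must replace with what you actually used and verified.

```python
from datetime import date


def disclosure_statement(tools, no_personal_data, checks, as_of, expected_change):
    """Fill in the disclosure template.

    tools maps each tool name to the country/region where it processes data;
    an empty region is reported as "unknown".
    """
    if not tools:
        raise ValueError("name at least one tool")
    if not no_personal_data:
        # The activity forbids personal or confidential data, so refuse
        # to produce a statement that would attest otherwise.
        raise ValueError("the question must involve no personal or confidential data")
    names = ", ".join(tools)
    regions = "; ".join(
        f"{name}: {region or 'unknown'}" for name, region in tools.items()
    )
    return (
        f"For this exercise I used: {names}.\n"
        f"Each processes data outside South Africa, in: {regions}.\n"
        "The research question involved no identifiable personal information "
        "or third-party confidential material.\n"
        f"I verified the outputs as follows: {', '.join(checks)}.\n"
        f"Outputs are accurate as of {as_of.isoformat()}; I would expect the tool "
        f"capabilities and free-tier limits described to change within {expected_change}."
    )


# Placeholder values only.
statement = disclosure_statement(
    tools={"Tool A": "United States", "Tool B": ""},
    no_personal_data=True,
    checks=["Five-Point Citation Check", "dated-research check"],
    as_of=date(2026, 5, 15),
    expected_change="three months",
)
print(statement)
```

The `no_personal_data` flag is your own attestation; the code cannot check it for you.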

📝 The Week 10 Assessment

The assessment is a single piece of roughly 1,500 words, in the same spirit as the Week 9 assessment: an explicitly dated snapshot that acknowledges its own coming obsolescence. Free tools only. This is a hard rule, so that everyone is judged on a level playing field regardless of what they can afford. Required sections:

Section	What it contains
Tool comparison	The Activity 1 three-way comparison, with concrete observations per mode.
Applied failure taxonomy	Each observed failure classified patched / reduced / structural, with the Week 9.5 action for the structural ones.
Verification audit	The Activity 2 citation check, with the numbers: how many citations held up, how many didn't.
Data-flow disclosure	The completed disclosure statement and a sentence on the POPIA reasoning behind it.
Staleness reflection	What is dated about your findings, and your recommended retest cadence.
“If I had a paid subscription”	One honest paragraph on what you could not do for free, and whether it would have changed your conclusions.

🗺️ Week 10 in One Page

Pulling the week together:

An agent is a model plus a harness — tools, a loop, memory, and permissions — and since 2024 the harness, not the model, increasingly decides how well the whole thing works (10.1).
Reliability is not accuracy. Agents fail in new ways over long horizons; the failures that matter are structural, and the verification burden grows rather than shrinks (10.2).
The tool landscape is real but volatile — coding, computer-use, browser, and research agents, connected increasingly through MCP — and best read with three questions: which harness, how reliable, free from here? (10.3)
RAG has split into long-context, agentic RAG, and Deep Research; the right choice is task-dependent, and the researcher still decides what to trust (10.4).
A great deal is genuinely free from South Africa — Western and Chinese alike — if you know where to look and you treat cross-border data flow seriously for every tool (10.5).

The one idea to keep

Agents change what the tools can do. They do not change who is responsible for the result. Every capability in this week shifts work onto the machine and verification onto you — and the researcher who understands that trade, and keeps the verification, is the one who benefits from agents instead of being misled by them.